
    A complexity analysis of statistical learning algorithms

    We apply information-based complexity analysis to support vector machine (SVM) algorithms, with the goal of a comprehensive continuous algorithmic analysis of such algorithms. This involves complexity measures in which some higher-order operations (e.g., certain optimizations) are considered primitive for the purposes of measuring complexity. We consider classes of information operators and algorithms made up of scaled families, and investigate the utility of scaling the complexities to minimize error. We look at the division of statistical learning into information and algorithmic components, at the complexities of each, and at applications to SVM and more general machine learning algorithms. We give applications to SVM algorithms graded into linear and higher-order components, and give an example in biomedical informatics.
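
    As an illustrative note, the information/algorithmic split referred to above is usually written as follows in standard information-based complexity (IBC) notation; the notation is our assumption, not quoted from the paper.

```latex
% An algorithm \varphi acts on information N(f) = (L_1 f, \dots, L_n f).
% Its cost splits into an information part and a combinatory part:
\[
  \mathrm{cost}(\varphi, N, f) \;=\; c \cdot n \;+\;
  \mathrm{cost}_{\mathrm{comb}}\bigl(\varphi, N(f)\bigr),
\]
% where c is the cost of one information operation (here possibly a
% higher-order primitive, e.g. solving an optimization subproblem).
% The \varepsilon-complexity is the minimal cost achieving error at
% most \varepsilon:
\[
  \mathrm{comp}(\varepsilon) \;=\; \inf\bigl\{\, \mathrm{cost}(\varphi, N)
  \;:\; e(\varphi, N) \le \varepsilon \,\bigr\}.
\]
```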

    On the probabilistic continuous complexity conjecture

    In this paper we prove the probabilistic continuous complexity conjecture. In continuous complexity theory, this conjecture states that the complexity of solving a continuous problem with probability approaching 1 converges (in this limit) to the complexity of solving the same problem in the worst case. We prove that the conjecture holds if and only if the space of problem elements is uniformly convex. The non-uniformly convex case has a striking counterexample in the problem of identifying a Brownian path in Wiener space, where it is shown that the probabilistic complexity converges to only half of the worst-case complexity in this limit.
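
    A schematic restatement in standard IBC notation (the notation is our assumption; the abstract does not fix it): let comp^prob(ε, δ) be the minimal cost of producing an ε-approximation except on a set of measure δ, and comp^wor(ε) the worst-case complexity.

```latex
% The conjecture, proved to hold iff the space of problem elements is
% uniformly convex:
\[
  \lim_{\delta \to 0^{+}} \mathrm{comp}^{\mathrm{prob}}(\varepsilon, \delta)
  \;=\; \mathrm{comp}^{\mathrm{wor}}(\varepsilon).
\]
% For the non-uniformly-convex counterexample (identifying a Brownian
% path in Wiener space) the limit instead drops to half:
\[
  \lim_{\delta \to 0^{+}} \mathrm{comp}^{\mathrm{prob}}(\varepsilon, \delta)
  \;=\; \tfrac{1}{2}\, \mathrm{comp}^{\mathrm{wor}}(\varepsilon).
\]
```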

    On Some Integrated Approaches to Inference

    We present arguments for the formulation of a unified approach to different standard continuous inference methods from partial information. It is claimed that an explicit partition of information into a priori information (prior knowledge) and a posteriori information (data) is an important way of standardizing inference approaches so that they can be compared on a normative scale, and so that notions of optimal algorithms become farther-reaching. The inference methods considered include neural network approaches, information-based complexity, and Monte Carlo, spline, and regularization methods. The model is an extension of currently used continuous complexity models, with a class of algorithms in the form of optimization methods, in which an optimization functional (involving the data) is minimized. This extends the family of current approaches in continuous complexity theory, which include the use of interpolatory algorithms in worst and average case settings.
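
    As a hedged illustration of the optimization-algorithm class described above, the familiar regularization/spline functional is one instance; the specific functional below is our choice, not necessarily the paper's.

```latex
% A priori information: f lies in (a ball of) a Hilbert space F.
% A posteriori information (data): y_i = L_i f + noise, i = 1, ..., n.
% The algorithm outputs the minimizer of a data-dependent functional:
\[
  \varphi(y) \;=\; \operatorname*{arg\,min}_{g \in F}
  \;\sum_{i=1}^{n} \bigl(L_i g - y_i\bigr)^{2} \;+\; \lambda\, \|g\|_{F}^{2}.
\]
% Splines, regularization methods, and many neural-network training
% objectives arise from different choices of F, L_i, and \lambda.
```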

    Relationships among Interpolation Bases of Wavelet Spaces and Approximation Spaces

    A multiresolution analysis is a nested chain of related approximation spaces. This nesting in turn implies relationships among the interpolation bases of the approximation spaces and their derived wavelet spaces. Using these relationships, a necessary and sufficient condition is given for the existence of interpolation wavelets, via analysis of the corresponding scaling functions. It is also shown that any interpolation function for an approximation space plays the role of a special type of scaling function (an interpolation scaling function) when the corresponding family of approximation spaces forms a multiresolution analysis. Based on these interpolation scaling functions, a new algorithm is proposed for constructing the corresponding interpolation wavelets (when they exist in a multiresolution analysis). In simulations, our theorems are tested on several typical wavelet spaces, demonstrating both the existence of interpolation wavelets and their construction in a general multiresolution analysis.
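
    A minimal numerical sketch of the interpolation scaling function concept, using the classic piecewise-linear hat function (our own example, not the paper's algorithm): it satisfies the interpolation condition on the integers and a two-scale refinement relation.

```python
# Sketch (assumption-labeled, not the paper's code): the hat function
# is an interpolation scaling function, i.e. phi(k) = delta_{0,k} on
# the integers, and it satisfies a two-scale (refinement) relation.
import numpy as np

def hat(x):
    """Linear B-spline (hat) scaling function supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# Interpolation condition: phi(k) = 1 if k == 0, else 0.
ks = np.arange(-3, 4)
assert np.allclose(hat(ks), (ks == 0).astype(float))

# Two-scale relation: phi(x) = (1/2) phi(2x+1) + phi(2x) + (1/2) phi(2x-1).
x = np.linspace(-2, 2, 1001)
refined = 0.5 * hat(2 * x + 1) + hat(2 * x) + 0.5 * hat(2 * x - 1)
assert np.allclose(hat(x), refined)
print("hat function is an interpolation scaling function")
```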

    On the average uncertainty for systems with nonlinear coupling

    The increased uncertainty and complexity of nonlinear systems have motivated investigators to consider generalized approaches to defining an entropy function. New insights are achieved by defining the average uncertainty in the probability domain as a transformation of entropy functions. The Shannon entropy, when transformed to the probability domain, is the weighted geometric mean of the probabilities. For the exponential and Gaussian distributions, we show that the weighted geometric mean of the distribution is equal to the density of the distribution at the location plus the scale, i.e., at the width of the distribution. The average uncertainty is generalized via the weighted generalized mean, in which the moment is a function of the nonlinear source. Both the RĆ©nyi and Tsallis entropies transform to this definition of the generalized average uncertainty in the probability domain. For the generalized Pareto and Student's t-distributions, which are the maximum entropy distributions for these generalized entropies, the appropriate weighted generalized mean also equals the density of the distribution at the location plus the scale. A coupled entropy function is proposed, equal to the normalized Tsallis entropy divided by one plus the coupling.
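
    The Gaussian and exponential claims above can be checked numerically. A minimal sketch in our own notation (assuming entropy in nats): the weighted geometric mean of a density p is exp(āˆ« p ln p dx) = exp(āˆ’H), which should equal the density at location + scale.

```python
# Numerical check (a sketch, not the paper's code): exp(-H) equals the
# density evaluated at location + scale for Gaussian and exponential.
import numpy as np
from scipy import stats

for dist, loc_plus_scale in [
    (stats.norm(loc=2.0, scale=1.5), 2.0 + 1.5),   # Gaussian
    (stats.expon(loc=0.0, scale=3.0), 0.0 + 3.0),  # exponential
]:
    geo_mean = np.exp(-dist.entropy())  # exp(-H) = weighted geometric mean
    assert np.isclose(geo_mean, dist.pdf(loc_plus_scale))
print("exp(-H) equals the density at location + scale")
```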

    Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions

    The geometric mean is shown to be an appropriate statistic for the scale of a heavy-tailed coupled Gaussian distribution, or equivalently the Student's t-distribution. The coupled Gaussian is a member of a family of distributions parameterized by the nonlinear statistical coupling, which is the reciprocal of the degrees of freedom and is proportional to fluctuations in the inverse scale of the Gaussian. Existing estimators of the scale of the coupled Gaussian have relied on estimates of the full distribution, and they suffer from problems related to outliers in heavy-tailed distributions. In this paper, the scale of a coupled Gaussian is proven to be equal to the product of the generalized mean and the square root of the coupling. Our numerical computations of the scales of coupled Gaussians, using the generalized mean of random samples, indicate that only samples from a Cauchy distribution (with coupling parameter one) form an unbiased estimate with diminishing variance for large samples. Nevertheless, we also prove that the scale is a function of the geometric mean, the coupling term, and a harmonic number. Numerical experiments show that this estimator is unbiased with diminishing variance for large samples for a broad range of coupling values.
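
    A Monte Carlo sketch of the Cauchy case (coupling one) noted above; the estimator below is an illustration in our own notation, not the paper's general coupled estimator. For X ~ Cauchy(0, γ), E[ln |X|] = ln γ, so the geometric mean of |X_i| consistently estimates the scale γ.

```python
# Sketch: estimate the Cauchy scale via the geometric mean of |samples|.
import numpy as np

rng = np.random.default_rng(0)
gamma = 2.5                                        # true scale
samples = gamma * rng.standard_cauchy(size=200_000)
scale_hat = np.exp(np.mean(np.log(np.abs(samples))))
print(f"true scale {gamma}, geometric-mean estimate {scale_hat:.3f}")
# For general coupling values the paper corrects the geometric mean by
# a coupling-dependent factor (involving a harmonic number), which we
# do not reproduce here.
```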

    Radial bounds for perturbations of elliptic operators

    Elliptic operators $A = \sum_{|\alpha| \le m} b_\alpha(x)\, D^\alpha$, $\alpha$ a multi-index, with leading term positive and constant-coefficient, and with lower-order coefficients $b_\alpha(x) \in L^{r_\alpha} + L^\infty$ (with $n/r_\alpha + |\alpha| < m$), defined on $\mathbb{R}^n$ or a quotient space $\mathbb{R}^n / U_\alpha$, $U_\alpha \subset \mathbb{R}^n$, are considered. It is shown that the $L^p$-spectrum of $A$ is contained in a "parabolic region" $\Omega$ of the complex plane enclosing the positive real axis, uniformly in $p$. Outside $\Omega$, the kernel of the resolvent of $A$ is shown to be uniformly bounded by an $L^1$ radial convolution kernel. Some consequences are: $A$ can be closed in all $L^p$ ($1 \le p \le \infty$), and is essentially self-adjoint in $L^2$ if it is symmetric; $A$ generates an analytic semigroup $e^{-tA}$ in the right half-plane, strongly $L^p$ and pointwise continuous at $t = 0$. A priori estimates relating the leading term and remainder are obtained, and summability $\varphi(\varepsilon A) f \to \varphi(0) f$ as $\varepsilon \to 0$, with $\varphi$ analytic, is proved for $f \in L^p$, with convergence in $L^p$ and on the Lebesgue set of $f$. More comprehensive summability results are obtained when $A$ has constant coefficients.
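
    A schematic form of the main estimate; the exact shape of $\Omega$ and the constants below are assumptions on our part, since the abstract does not spell them out.

```latex
% The L^p-spectrum lies in a rightward-opening parabolic region
% enclosing the positive real axis, e.g. (shape assumed):
\[
  \sigma_p(A) \;\subseteq\; \Omega
  \;=\; \bigl\{\, \xi + i\eta \in \mathbb{C} \;:\; \xi \ge a\,\eta^{2} - b \,\bigr\},
  \qquad a, b > 0,
\]
% uniformly in p, and for \lambda outside \Omega the resolvent kernel
% is dominated by a radial L^1 convolution kernel G_\lambda:
\[
  \bigl| (A - \lambda)^{-1}(x, y) \bigr| \;\le\; G_\lambda\bigl(|x - y|\bigr),
  \qquad G_\lambda \in L^1(\mathbb{R}^n).
\]
```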

    Transcription factor-DNA binding via machine learning ensembles

    The network of interactions between transcription factors (TFs) and their regulatory gene targets governs many of the behaviors and responses of cells. Construction of a transcriptional regulatory network involves three interrelated problems, defined for any regulator: finding (1) its target genes, (2) its binding motif, and (3) its DNA binding sites. Many tools have been developed in the last decade to solve these problems. However, the performance of these algorithms has not been consistent across all transcription factors. Because machine learning algorithms have shown advantages in integrating information of different types, we investigate a machine-learning-based approach to integrating predictions from an ensemble of commonly used motif exploration algorithms.
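
    A hedged sketch of the ensemble idea: treat each motif algorithm's score for a candidate site as a feature and learn a meta-classifier over those features. The algorithm scores, features, and data below are hypothetical, not the paper's pipeline.

```python
# Sketch: integrate per-algorithm binding-site scores with a
# meta-classifier (synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_sites = 500

# y: 1 if the candidate site is a true binding site, else 0.
y = rng.integers(0, 2, size=n_sites)

# Columns: scores from three hypothetical motif-discovery algorithms,
# simulated here as the label plus noise of varying quality.
X = np.column_stack([
    y + rng.normal(scale=s, size=n_sites)
    for s in (0.8, 1.0, 1.5)
])

# The meta-learner integrates the individual algorithms' predictions.
meta = LogisticRegression()
print("cross-validated AUC:",
      cross_val_score(meta, X, y, cv=5, scoring="roc_auc").mean())
```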
    • ā€¦
    corecore